
Start migration off defaults to conda-forge channel #4126

Closed
wants to merge 1 commit

Conversation

mnorris11

Differential Revision: D68043874

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D68043874

mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request Jan 11, 2025
…#4126)

Summary: Pull Request resolved: facebookresearch#4126

Differential Revision: D68043874

mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request Jan 24, 2025

mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request Jan 25, 2025

mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request Jan 27, 2025

Summary:

Good resource on overriding channels to make sure we aren't using `defaults`: https://stackoverflow.com/questions/67695893/how-do-i-completely-purge-and-disable-the-default-channel-in-anaconda-and-switch
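
For reference, a minimal sketch of that channel override; these are standard `conda config` subcommands, but the exact setup our CI uses lives in the workflow files, so treat this as an assumed illustration rather than the change itself:
```
# Assumed illustration: make conda-forge the only channel and stop
# falling back to the anaconda defaults channel.
conda config --add channels conda-forge
conda config --remove channels defaults   # errors harmlessly if defaults was never listed
conda config --set channel_priority strict
conda config --show channels              # verify: only conda-forge should remain
```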

Explanation of changes:
- changed to miniforge from miniconda: this ensures we pull only from conda-forge, not from `defaults`, when creating the environment
- architecture: ARM64 and aarch64 are the same thing, but there is no miniforge package named ARM64 for Linux, so the check has to look for aarch64 instead. macOS is the exception to this rule: it does have a macOS-arm64 package, so there is a conditional that uses arm64 on Mac (see the shell sketch at the end of this section). https://github.com/conda-forge/miniforge/releases/
- action.yml mkl 2022.2.1 change: conda-forge and defaults have completely different dependencies. defaults required intel-openmp, but on conda-forge, mkl 2023.1 or higher requires llvm-openmp >=14.0.6, which is incompatible with PyTorch builds <2.5 that require llvm-openmp <14.0. We would need to upgrade Python to 3.12 first, then upgrade the PyTorch build, and only then this mkl. (The meta.yaml changes are the ones that narrow it to 2022.2.1 during `conda build faiss`.) So it has simply been pinned to 2022.2.1 for now.
- mkl now requires an _openmp_mutex of type "llvm" instead of "gnu": prior non-cuVS builds all used gnu, because intel-openmp from the anaconda defaults channel does not require llvm-openmp. We now need to remove the gnu variant, which is pulled in automatically during miniconda setup, and keep only the llvm variant of _openmp_mutex.
- liblief: the above changes tried to pull in liblief 0.15, which results in an error like `AttributeError: module 'lief._lief.ELF' has no attribute 'ELF_CLASS'`. Passing PR builds on defaults use lief 0.12, so I pinned that version for Python 3.9, 3.10, and 3.11. For Python 3.12, we need lief 0.14 or higher.
- gcc_linux-64 =11.2 for faiss-gpu on cudatoolkit-11.2: GPU builds kept referencing gcc 11.2 even though 14.2 was installed. I couldn't figure out why, or how to point them at the 14.2 installed on the host. Current nightly builds still reference 11.2, so I pinned 11.2 to keep it the same; moving to 14.2 will take more investigation.
- meta.yaml mkl 2023.0 vs 2023.1 across Python versions: 3.9, 3.10, and 3.11 pass with 2023.0, but Python 3.12 needs mkl 2023.1 or higher. Otherwise we get:
```
INTEL MKL ERROR: $PREFIX/lib/python3.12/site-packages/faiss/../../.././libmkl_def.so.2: undefined symbol: mkl_sparse_optimize_bsr_trsm_i8.
Intel MKL FATAL ERROR: Cannot load libmkl_def.so.2.
```
So the solution was to put a set of conditions in faiss/meta.yaml.
We should be able to use Jinja macros to reduce the duplication, but that needs more investigation; the macro attempt was failing: https://github.com/facebookresearch/faiss/actions/runs/12915187334/job/36016477707?pr=4126 (paste of logs here: P1716887936). This can be a future BE task.
Macro example (the `-` signs strip the whitespace lines before and after):
```
{% macro inclmkldevel() %}
{%- if PY_VER == '3.9' or PY_VER == '3.10' or PY_VER == '3.11' -%}
        - mkl-devel =2023.0  # [x86_64]
        - liblief =0.12.3  # [not win]
        - python_abi <3.12
{%- elif PY_VER == '3.12' %}
        - mkl-devel >=2023.2.0  # [x86_64]
        - liblief =0.15.1  # [not win]
        - python_abi =3.12
{% endif -%}
{% endmacro %}
```
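
If the macro approach is picked up again, invoking it is a plain Jinja expression at the spot where the pinned lines currently live; the surrounding section below is a hypothetical placement for illustration, not the actual faiss/meta.yaml layout:
```
requirements:
  host:
{{ inclmkldevel() }}
```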
python_abi had to be pinned inside these conditions because otherwise several builds failed with this error:
```
File "/Users/runner/miniconda3/lib/python3.12/site-packages/conda_build/utils.py", line 1919, in insert_variant_versions
        matches = [regex.match(pkg) for pkg in reqs]
                   ^^^^^^^^^^^^^^^^
    TypeError: expected string or bytes-like object, got 'list'
```
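
As mentioned in the architecture bullet above, here is a minimal shell sketch of the installer selection. The `uname` probing and variable names are assumptions for illustration; the asset names follow the Miniforge releases page linked earlier:
```
# Sketch: pick the Miniforge3 installer asset for the current platform.
# Linux ARM machines report aarch64, while macOS ARM reports arm64, and
# the Miniforge release assets follow the same naming split.
os="$(uname -s)"; arch="$(uname -m)"
case "${os}-${arch}" in
  Linux-x86_64)  asset="Miniforge3-Linux-x86_64.sh" ;;
  Linux-aarch64) asset="Miniforge3-Linux-aarch64.sh" ;;
  Darwin-arm64)  asset="Miniforge3-MacOSX-arm64.sh" ;;
  Darwin-x86_64) asset="Miniforge3-MacOSX-x86_64.sh" ;;
  *) echo "unsupported platform: ${os}-${arch}" >&2; exit 1 ;;
esac
curl -fsSLO "https://github.com/conda-forge/miniforge/releases/latest/download/${asset}"
```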

Unit test notes:
- test_gpu_basics.py, GPU residual quantizer: debugged extensively with Matthijs. The problem is in the C++ -> Python conversion: the C++ side prints the right values, but by the time the data is back in Python it is filled with junk. It is only reproducible on CUDA 11.4.4 after switching channels and is likely a compiler problem. We discussed it and resolved to create a C++-side unit test (so this diff adds TestGpuResidualQuantizer) to verify the functionality, and to disable the Python unit test but leave it in the codebase with a comment. Matthijs made extensive notes in https://docs.google.com/document/d/1MjMdOpPgx-MArdrYJZCaQlRqlrhSj5Y1Z9lTyiab8jc/edit?usp=sharing.
- test_contrib.py: this now hangs forever and times out the runner on Windows with Python 3.12, so it is skipped for now.
- test_mem_leak.cpp seems flaky: it sometimes fails, then passes on rerun.

Unfixed issues:
- Downloads sometimes fail with text like the following; they pass on re-run.
```
libgomp-14.2.0-h77fa898_1.conda extraction failed
  Warning: error    libmamba Error when extracting package: Could not chdir info/recipe/parent/patches/0005-Hardcode-HAVE_ALIGNED_ALLOC-1-in-libstdc-v3-configur.patch

  error    libmamba Error when extracting package: Could not chdir info/recipe/parent/patches/0005-Hardcode-HAVE_ALIGNED_ALLOC-1-in-libstdc-v3-configur.patch
  Warning: Found incorrect download: libgomp. Aborting

  Found incorrect download: libgomp. Aborting
  Warning:
```

Green builds and tests for both the pull-request and nightly workflows: https://github.com/facebookresearch/faiss/actions/runs/12956402963/job/36148818361

Reviewed By: asadoughi

Differential Revision: D68043874

@facebook-github-bot
Contributor

This pull request has been merged in 32beb16.
